managed file events #228

Open
DoyleDev wants to merge 4 commits into databricks:main from DoyleDev:managed-file-events

@DoyleDev DoyleDev commented Feb 5, 2026

Creating a module and example of how to create the necessary IAM role, policy, external location, and storage credential resources needed for managed file events.

@DoyleDev DoyleDev requested review from a team as code owners February 5, 2026 21:55
@DoyleDev DoyleDev requested a review from rauchy February 5, 2026 21:55
Comment thread examples/aws-managed-file-events/terraform.tfvars
Comment thread modules/aws-managed-file-events/iam.tf Outdated
@DoyleDev DoyleDev requested a review from alexott February 6, 2026 15:38
@rauchy rauchy removed their request for review February 20, 2026 11:08
@alexott alexott requested a review from Copilot May 12, 2026 12:08

Copilot AI left a comment

Pull request overview

Adds a new Terraform module and a corresponding example to provision AWS + Databricks Unity Catalog resources required to use Databricks Managed File Events (managed SQS + file-notification mode) with Auto Loader.

Changes:

  • Introduces modules/aws-managed-file-events to create/use an S3 bucket, create the UC IAM role/policy, and provision a storage credential + external location (optionally a catalog) with file events enabled.
  • Adds examples/aws-managed-file-events demonstrating how to call the module and configure providers/inputs.
  • Adds module/example documentation plus terraform-docs Makefile targets.
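As a rough illustration of the "storage credential + external location with file events enabled" item above, the following sketch shows how those two resources might be wired together. It is not the PR's actual `main.tf`: the variable names `role_arn` and `bucket_name` and both resource names are hypothetical. The `enable_file_events` and `file_event_queue` / `managed_sqs` attributes reflect the Databricks Terraform provider as I understand it and should be checked against the provider documentation for the version in use.

```hcl
# Hypothetical inputs; the real module derives these from its own IAM and S3 resources.
variable "role_arn" {
  type        = string
  description = "ARN of the IAM role that Unity Catalog assumes"
}

variable "bucket_name" {
  type        = string
  description = "Name of the S3 bucket backing the external location"
}

resource "databricks_storage_credential" "this" {
  name = "managed-file-events-credential" # assumed name

  aws_iam_role {
    role_arn = var.role_arn
  }
}

resource "databricks_external_location" "this" {
  name            = "managed-file-events-location" # assumed name
  url             = "s3://${var.bucket_name}/"
  credential_name = databricks_storage_credential.this.name

  # Ask Databricks to provision and manage the SQS queue for file notifications.
  enable_file_events = true
  file_event_queue {
    managed_sqs {}
  }
}
```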

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| modules/aws-managed-file-events/versions.tf | New module provider requirements (missing required_version). |
| modules/aws-managed-file-events/variables.tf | Module inputs/locals (includes unused required vars; missing conditional validations). |
| modules/aws-managed-file-events/s3.tf | Optional S3 bucket creation + encryption/public access block, or data source for existing bucket. |
| modules/aws-managed-file-events/iam.tf | IAM role/policy creation using Databricks UC policy/assume-role policy data sources. |
| modules/aws-managed-file-events/main.tf | Creates storage credential, external location with managed file events, and grants. |
| modules/aws-managed-file-events/catalog.tf | Optional catalog creation + grants (force-destroy wired to bucket flag). |
| modules/aws-managed-file-events/outputs.tf | Exposes bucket, IAM role, storage credential, external location, and optional catalog outputs. |
| modules/aws-managed-file-events/README.md | New module docs + usage snippets (contains a few copy/paste issues). |
| modules/aws-managed-file-events/Makefile | terraform-docs helper targets for the module. |
| examples/aws-managed-file-events/versions.tf | Example provider requirements (missing required_version). |
| examples/aws-managed-file-events/providers.tf | Example AWS + Databricks provider configuration (PAT var currently unused). |
| examples/aws-managed-file-events/variables.tf | Example input variables (includes unused/misdescribed PAT variable). |
| examples/aws-managed-file-events/main.tf | Invokes the new module. |
| examples/aws-managed-file-events/outputs.tf | Example outputs exposing module outputs. |
| examples/aws-managed-file-events/README.md | Example instructions + code snippets (some inaccuracies). |
| examples/aws-managed-file-events/terraform.tfvars | Sample tfvars (includes secret-looking placeholders). |
| examples/aws-managed-file-events/Makefile | terraform-docs helper targets for the example. |

Comment on lines +1 to +6
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
    }
Comment on lines +1 to +5
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0"
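Both `versions.tf` excerpts above pin the AWS provider but, as the file summary notes, declare no `required_version`. A minimal sketch of what adding one could look like follows. The `>= 1.9` bound is an assumption rather than a value from the PR; the right bound is whichever Terraform release the module is actually tested against.

```hcl
terraform {
  # Assumed lower bound, not taken from the PR.
  required_version = ">= 1.9"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0" # constraint copied from the excerpt above
    }
    databricks = {
      source = "databricks/databricks"
      # Version constraint left out because the excerpt does not show it.
    }
  }
}
```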
Comment on lines +12 to +25
variable "region" {
  type        = string
  description = "(Required) AWS region where the assets will be deployed"
}

variable "aws_account_id" {
  type        = string
  description = "(Required) AWS account ID where the IAM role will be created"
}

variable "databricks_account_id" {
  type        = string
  description = "(Required) Databricks Account ID"
}
variable "existing_bucket_name" {
  type        = string
  description = "(Optional) Name of existing S3 bucket when create_bucket is false"
  default     = null
variable "catalog_name" {
  type        = string
  description = "(Optional) Name for the catalog. Required if create_catalog is true"
  default     = null
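The descriptions of `existing_bucket_name` and `catalog_name` above state conditional requirements that nothing enforces, which appears to be what the summary means by "missing conditional validations". One possible way to enforce them is a `validation` block that references the companion flag, sketched below. Cross-variable references inside `validation` require Terraform 1.9 or later. The declarations of `create_bucket` and `create_catalog` (types, defaults, descriptions) are assumptions, since the excerpts only mention those names.

```hcl
# Assumed declaration; the PR excerpt only names this flag.
variable "create_bucket" {
  type        = bool
  description = "(Optional) Whether to create a new S3 bucket"
  default     = true
}

variable "existing_bucket_name" {
  type        = string
  description = "(Optional) Name of existing S3 bucket when create_bucket is false"
  default     = null

  validation {
    # Passes when a bucket is being created, or when an existing one is named.
    condition     = var.create_bucket || var.existing_bucket_name != null
    error_message = "existing_bucket_name must be set when create_bucket is false."
  }
}

# Assumed declaration; the PR excerpt only names this flag.
variable "create_catalog" {
  type        = bool
  description = "(Optional) Whether to create a catalog"
  default     = false
}

variable "catalog_name" {
  type        = string
  description = "(Optional) Name for the catalog. Required if create_catalog is true"
  default     = null

  validation {
    # Passes when no catalog is requested, or when a name is supplied.
    condition     = !var.create_catalog || var.catalog_name != null
    error_message = "catalog_name must be set when create_catalog is true."
  }
}
```

On older Terraform versions the same checks could instead live in a `precondition` inside a resource's `lifecycle` block.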
Comment on lines +103 to +112
Or in Lakeflow Declarative Pipelines:

```python
@dlt.table
def my_table():
    return spark.readStream.format("cloudFiles") \
        .option("cloudFiles.format", "json") \
        .option("cloudFiles.useManagedFileEvents", "true") \
        .load("s3://bucket/path")
```
Comment on lines +48 to +55


variable "databricks_pat_token" {
  type        = string
  sensitive   = true
  description = "(Required) Databricks service principal client secret"
}

Comment on lines +5 to +9
databricks_account_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
databricks_host = "https://my-workspace.cloud.databricks.com"
databricks_pat_token = "dapixxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
databricks_client_id = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
databricks_client_secret = "dosexxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
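The sample tfvars above carries both a PAT and service principal credentials, while the file summary reports the PAT variable as unused. A minimal sketch of a provider configuration that relies only on the service principal (OAuth) values could look like the following. The variable names are taken from the tfvars excerpt; the declarations and descriptions are assumptions, and the example's real `providers.tf` is not shown in this conversation.

```hcl
variable "databricks_host" {
  type        = string
  description = "(Required) Databricks workspace URL"
}

variable "databricks_client_id" {
  type        = string
  description = "(Required) Databricks service principal client ID"
}

variable "databricks_client_secret" {
  type        = string
  sensitive   = true # keeps the value out of plan and apply output
  description = "(Required) Databricks service principal client secret"
}

# Service principal authentication; no PAT variable is needed with this setup.
provider "databricks" {
  host          = var.databricks_host
  client_id     = var.databricks_client_id
  client_secret = var.databricks_client_secret
}
```

Rather than committing secret-looking values to `terraform.tfvars`, the sensitive inputs can be supplied through `TF_VAR_databricks_client_secret`-style environment variables.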
Comment on lines +94 to +105
Or in Lakeflow Declarative Pipelines:

```python
from pyspark import pipelines as dp

@dp.table
def my_table():
    return spark.readStream.format("cloudFiles") \
        .option("cloudFiles.format", "json") \
        .option("cloudFiles.useManagedFileEvents", "true") \
        .load("/Volumes")  # Ingesting from a volume that points to your S3 bucket will be more performant than the S3 location itself.
```
2. Add a `variables.tf` with the same content in [variables.tf](variables.tf)
3. Add a `terraform.tfvars` file and provide values to each defined variable
4. Configure authentication to your Databricks workspace and AWS account
5. Add a `output.tf` file

3 participants